Abstract

Causality has recently been introduced in databases to model, characterize, and possibly compute causes for query results (answers). Connections between query causality and consistency-based diagnosis and database repairs (wrt. integrity constraint violations) have been established in the literature. In this work we establish connections between query causality and abductive diagnosis and the view-update problem. The unveiled relationships allow us to obtain new complexity results for query causality (the main focus of our work) and also for the two other areas.


Causality is an important notion that appears at the foundations of many scientific disciplines, in the practice of technology, and also in our everyday life. Causality is indispensable for understanding and managing uncertainty in data, information, knowledge, and theories. In data management in particular, there is a need to represent, characterize, and compute the causes that explain why certain query results are or are not obtained, or why natural semantic conditions, such as integrity constraints, are not satisfied. Causality can also be used to explain the contents of a view, i.e. of a predicate with virtual contents that is defined in terms of other physical, materialized relations (tables).

In this work we concentrate on causality as defined for, and applied to, relational databases. Most of the work on causality has been developed in the context of knowledge representation, and little has been said about causality in data management. Furthermore, in a world of big, uncertain data, the need to understand the data beyond simple query answering, by introducing explanations in different forms, has become particularly relevant.


The notion of causality-based explanation for a query result was introduced in (Meliou et al., 2010a), on the basis of the deeper concept of actual causation.¹ Intuitively, a tuple (of constants) $\tau$ is an actual cause for an answer $\bar{a}$ to a conjunctive query $Q$ from a relational database instance $D$ if there is a "contingent" subset of tuples $\Gamma$, accompanying $\tau$, such that, after removing $\Gamma$ from $D$, removing $\tau$ from $D \setminus \Gamma$ causes $\bar{a}$ to switch from being an answer to being a non-answer (i.e. not being an answer). Usually, actual causes and contingent tuples are restricted to be among a pre-specified set of endogenous tuples, which are admissible, possible candidates for causes, as opposed to exogenous tuples.

A cause $\tau$ may have different associated contingency sets $\Gamma$. Intuitively, the smaller they are, the stronger $\tau$ is as a cause (it needs less company to undermine the query answer). So, some causes may be stronger than others. This idea is formally captured through the notion of causal responsibility, introduced in (Meliou et al., 2010a). It reflects the relative degree of actual causality. In applications involving large data sets, it is crucial to rank potential causes according to their responsibilities (Meliou et al., 2010b,a).

Furthermore, view-conditioned causality was proposed in (Meliou et al., 2010b, 2011) as a restricted form of query causality, to determine causes for a set of unexpected query results, but conditioned to the correctness of prior knowledge about some other set of results.

Actual causation, as used in (Meliou et al., 2010a, 2011), can be traced back to (Halpern & Pearl, 2001, 2005), which provides a model-based account of causation on the basis of counterfactual dependence.² Causal responsibility was introduced in (Chockler & Halpern, 2004) to provide a graded, quantitative notion of causality when multiple causes may over-determine an outcome.

Model-based diagnosis (Struss, 2008, sec. 10.3), an area of knowledge representation, addresses the following problem: given the specification of a system in some logical formalism and a usually unexpected observation about the system, obtain explanations for the observation in the form of a diagnosis for the unintended behavior. Since both model-based diagnosis and causality are related to explanations, a first connection between causality and consistency-based diagnosis (Reiter, 1987), a form of model-based diagnosis, was established in (Salimi & Bertossi, 2014, 2015): causality and the responsibility problem can be formulated as consistency-based diagnosis problems, which allowed the results in (Meliou et al., 2010a) to be extended. However, no precise connection has been established so far between causality and abductive diagnosis (Console et al., 1991; Eiter & Gottlob, 1995), another form of model-based diagnosis.

¹ In contrast with general causal claims, such as “smoking causes cancer”, which refer to some sort of related events, actual causation specifies a particular instantiation of a causal relationship, e.g., “Joe’s smoking is a cause for his cancer”.

² As discussed in (Salimi & Bertossi, 2015), some objections to the Halpern-Pearl model of causality and the corresponding changes (Halpern, 2014, 2015) do not affect results in the context of databases.

The definition of causality for query answers applies to monotone queries (Meliou et al., 2010a,b). However, all complexity and algorithmic results in (Meliou et al., 2010a; Salimi & Bertossi, 2015) have been restricted to first-order (FO) monotone queries. Other important classes of monotone queries, such as Datalog queries (Ceri et al., 1989; Abiteboul et al., 1995), possibly with recursion, require further investigation.

In (Salimi & Bertossi, 2015) connections were established between query causality, database repairs (Bertossi, 2011), and consistency-based diagnosis. In particular, complexity results for several causality problems were obtained from the repair connection. Along the lines of this kind of research, in this work we unveil natural connections between actual causation and abductive diagnosis, and also the view-update problem in databases (more on this latter connection later in the section).

As opposed to consistency-based diagnosis, which is usually practiced with FO specifications, abductive diagnosis is commonly performed under a logic programming (LP) approach (in the general sense of LP) to knowledge representation (Denecker & Kakas, 2002; Eiter et al., 1997; Gottlob et al., 2010b). Since Datalog can be seen as a form of LP, we manage to extend and formulate the notion of query-answer causality to Datalog queries via the abductive diagnosis connection, in this way extending causality to a new class of queries, e.g. recursive queries, and obtaining complexity results on causality for them.

Abductive reasoning/diagnosis has been applied to the view-update problem in databases (Kakas & Mancarella, 1990; Console et al., 1995), which is about characterizing and computing updates of physical database relations that give an account of (or have as result) the intended updates on views. The idea is that abductive diagnosis provides (abduces) the reasons for the desired view updates, and they are given as changes on base tables.

In this work we also explore fruitful connections of causality with this view-update problem (Abiteboul et al., 1995), i.e. about updating a database through views. An important aspect of the problem is that one wants the base (source) database, i.e. the base relations, to change minimally while still producing the view updates. Put in different terms, it is an update-propagation problem, from views to base relations. This is a classical and important problem in databases.


The delete-propagation problem (Buneman et al., 2002; Kimelfeld, 2012; Kimelfeld et al., 2012) is a particular case of the view-update problem where only tuple deletions are allowed on/from the views. If the views are defined by monotone queries, only database deletions can give an account of view deletions. So, in this case, a minimal set (in some sense) of deletions from the base relations is expected to be performed. This is the “minimal source-side-effect” case. It is also possible to consider minimizing the side effect on the view, which also requires that other tuples in the (virtual) view contents are not affected (deleted) (Buneman et al., 2002).
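The two delete-propagation variants can be made concrete with a small brute-force sketch in Python. Everything in it is a hypothetical illustration: the view V(x, z) ← R(x, y), S(y, z), the base data, and the function names are ours, and "minimal" is read here as minimum cardinality. The search is exhaustive and exponential, so it only illustrates the definitions; it is not one of the algorithms from the cited works.

```python
from itertools import combinations

def view(db):
    """Hypothetical monotone view V(x, z) <- R(x, y), S(y, z)."""
    return {(r[1], s[2]) for r in db if r[0] == "R"
            for s in db if s[0] == "S" and r[2] == s[1]}

def min_source_side_effect(db, target):
    """A minimum-cardinality set of base tuples whose deletion removes target."""
    facts = sorted(db)
    for size in range(len(facts) + 1):
        for delta in combinations(facts, size):
            if target not in view(db - set(delta)):
                return set(delta)
    return None  # unreachable: deleting every tuple empties the view

def min_view_side_effect(db, target):
    """A deletion removing target that loses as few other view tuples as possible."""
    others = view(db) - {target}
    facts = sorted(db)
    best, best_loss = None, None
    for size in range(len(facts) + 1):  # smaller deletions win ties
        for delta in combinations(facts, size):
            remaining = view(db - set(delta))
            if target in remaining:
                continue
            loss = len(others - remaining)
            if best_loss is None or loss < best_loss:
                best, best_loss = set(delta), loss
    return best, best_loss

# Hypothetical base data: V(a, d) has two derivations, through b1 and through b2.
D = {("R", "a", "b1"), ("R", "a", "b2"), ("R", "c", "b2"),
     ("S", "b1", "d"), ("S", "b1", "e"), ("S", "b2", "d")}
source_fix = min_source_side_effect(D, ("a", "d"))
view_fix, view_loss = min_view_side_effect(D, ("a", "d"))
```

Here the view is {(a, d), (a, e), (c, d)}. The source-side search returns the first two-tuple deletion in its enumeration order, {R(a, b1), R(a, b2)}, which also deletes V(a, e); the view-side search returns {R(a, b2), S(b1, d)} with no view side effect.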


In this work we provide a precise connection between different variants of the delete-propagation problem and query causality. In particular, we show that the minimal source-side-effect problem is related to the most-responsible-cause problem, which was formulated and investigated in (Salimi & Bertossi, 2015); and also that the “minimal view side-effect problem” is related to the view-conditioned causality we already mentioned above.


The established connections between abductive diagnosis, query causality, and delete-propagation problems allow us to adopt (and possibly adapt) established results for some of them for application to the others. In this way we obtain some new complexity results.

More precisely, our main results are as follows:³

1. We establish precise connections between causality for Datalog queries and abductive diagnosis. More precisely, we establish mutual characterizations of each in terms of the other, and computational reductions, between actual causes for Datalog queries and abductive diagnosis from Datalog specifications.

   We profit from these connections to obtain new algorithmic and complexity results for each of the two problems separately.

   (a) We characterize and obtain causes in terms of, and from, abductive diagnoses.

   (b) We show that deciding tuple causality for Datalog queries, possibly recursive, is NP-complete in data.

   (c) We identify a class of Datalog queries for which deciding causality is tractable in combined complexity.

2. We establish and profit from precise connections between delete-propagation and causality. More precisely, we show that:

   (a) Most-responsible causes and view-conditioned causes can be obtained from solutions to different variants of the delete-propagation problem, and vice versa.

   (b) Computing the size of the solution to a minimum source-side-effect problem is hard for $FP^{NP(\log(n))}$.

   (c) Deciding whether an answer has a view-conditioned cause is NP-complete.

   (d) We can identify some new classes of queries for which computing minimum source-side-effect delete-propagation is tractable.

³ The possible connections between the areas and problems in this paper were suggested in (Bertossi & Salimi, 2014), but no precise results were formulated there.


1 PRELIMINARIES AND CAUSALITY DECISION PROBLEMS

We consider relational database schemas of the form $\mathcal{S} = (U, \mathcal{P})$, where $U$ is the possibly infinite database domain and $\mathcal{P}$ is a finite set of database predicates⁴ of fixed arities. A database instance $D$ compatible with $\mathcal{S}$ can be seen as a finite set of ground atomic formulas (in databases aka. atoms or tuples), of the form $P(c_1, \ldots, c_n)$, where $P \in \mathcal{P}$ has arity $n$, and the constants $c_1, \ldots, c_n \in U$.

A conjunctive query (CQ) is a formula $Q(\bar{x})$ of the first-order (FO) language $\mathcal{L}(\mathcal{S})$ associated to $\mathcal{S}$ of the form $\exists \bar{y}\,(P_1(\bar{s}_1) \wedge \cdots \wedge P_m(\bar{s}_m))$, where the $P_i(\bar{s}_i)$ are atomic formulas, i.e. $P_i \in \mathcal{P}$, and the $\bar{s}_i$ are sequences of terms, i.e. variables or constants of $U$. The $\bar{x}$ in $Q(\bar{x})$ shows all the free variables in the formula, i.e. those not appearing in $\bar{y}$. A sequence $\bar{c}$ of constants is an answer to query $Q(\bar{x})$ if $D \models Q[\bar{c}]$, i.e. the query becomes true in $D$ when the variables are replaced by the corresponding constants in $\bar{c}$. We denote the set of all answers to an open conjunctive query $Q(\bar{x})$ with $Q(D)$.

A conjunctive query is boolean (a BCQ) if $\bar{x}$ is empty, i.e. the query is a sentence, in which case it is true or false in $D$, denoted by $D \models Q$ and $D \not\models Q$, respectively. When $Q$ is a BCQ, or contains no free variables, $Q(D) = \{\mathit{yes}\}$ if $Q$ is true, and $Q(D) = \emptyset$ otherwise.

A query $Q$ is monotone if for every two instances $D_1 \subseteq D_2$, $Q(D_1) \subseteq Q(D_2)$, i.e. the set of answers grows monotonically with the instance. For example, CQs and unions of CQs (UCQs) are monotone queries. Datalog queries (Ceri et al., 1989; Abiteboul et al., 1995), although not FO, are also monotone (cf. Section 1.1 for more details).

⁴ As opposed to built-in predicates (e.g. $\neq$), which we assume do not appear, unless explicitly stated otherwise.

1.1 CAUSALITY AND RESPONSIBILITY

In the rest of this work, unless otherwise stated, we will assume that a database instance $D$ is split into two disjoint sets, $D = D^n \cup D^x$, where $D^n$ and $D^x$ denote the sets of endogenous and exogenous tuples, respectively; and $Q$ is a monotone query.

Definition 1.1. A tuple $\tau \in D^n$ is a counterfactual cause for an answer $\bar{a}$ to $Q$ in $D$ if $D \models Q(\bar{a})$ and $D \setminus \{\tau\} \not\models Q(\bar{a})$. A tuple $\tau \in D^n$ is an actual cause for $\bar{a}$ if there exists $\Gamma \subseteq D^n$, called a contingency set, such that $\tau$ is a counterfactual cause for $\bar{a}$ in $D \setminus \Gamma$. □

$\mathit{Causes}(D, Q(\bar{a}))$ denotes the set of actual causes for $\bar{a}$. This set is non-empty whenever $Q(\bar{a})$ is true in $D$ but not already true in $D^x$ alone. When the query $Q$ is boolean, $\mathit{Causes}(D, Q)$ contains the causes for the answer yes in $D$.

The definition of query-answer causality can be applied without any conceptual changes to Datalog queries. In the case of Datalog, the query $Q(\bar{x})$ is a whole program $\Pi$ that accesses an underlying extensional database $E$ that is not part of the query. Program $\Pi$ contains a rule that defines a top answer-collecting predicate $\mathit{Ans}(\bar{x})$. Now, $\bar{a}$ is an answer to query $\Pi$ on $E$ when $\Pi \cup E \models \mathit{Ans}(\bar{a})$. Here, entailment ($\models$) means that the RHS belongs to the minimal model of the LHS. A Datalog query is boolean if the top answer-predicate is propositional, say $\mathit{ans}$. In the case of Datalog, we sometimes use the notation $\mathit{Causes}(E, \Pi(\bar{a}))$ or $\mathit{Causes}(E, \Pi)$, depending on whether $\Pi$ has $\mathit{Ans}(\bar{x})$ or $\mathit{ans}$ as answer predicate, resp.

Given a $\tau \in \mathit{Causes}(D, Q(\bar{a}))$, we collect all subset-minimal contingency sets associated with $\tau$:

$$\mathit{Cont}(D, Q(\bar{a}), \tau) := \{\Gamma \subseteq D^n \mid D \setminus \Gamma \models Q(\bar{a}),\ D \setminus (\Gamma \cup \{\tau\}) \not\models Q(\bar{a}), \text{ and } \forall \Gamma' \subsetneq \Gamma,\ D \setminus (\Gamma' \cup \{\tau\}) \models Q(\bar{a})\}.$$

The responsibility of actual cause $\tau$ for answer $\bar{a}$, denoted $\rho_{Q(\bar{a})}(\tau)$, is $\frac{1}{1 + |\Gamma|}$, where $|\Gamma|$ is the size of the smallest contingency set for $\tau$. Responsibility can be extended to all tuples in $D^n$ by setting their value to 0 when they are not actual causes for $Q$.

Example 1.1. Consider a database $D$ with relations Author(Name, JName) and Journal(JName, Topic, #Paper), and contents as below:

Author
  Name   JName
  Joe    TKDE
  John   TKDE
  Tom    TKDE
  John   TODS

Journal
  JName   Topic   #Paper
  TKDE    XML     30
  TKDE    CUBE    31
  TODS    XML     32

Consider the conjunctive query:

$$Q(\mathit{Name}, \mathit{Topic}):\ \exists \mathit{JName}\, \exists \#\mathit{Paper}\, (\mathit{Author}(\mathit{Name}, \mathit{JName}) \wedge \mathit{Journal}(\mathit{JName}, \mathit{Topic}, \#\mathit{Paper})) \qquad (1)$$

which has the following answers:

Q(D)
  Name   Topic
  Joe    XML
  Joe    CUBE
  Tom    XML
  Tom    CUBE
  John   XML
  John   CUBE

Assume (John, XML) is an unexpected answer to $Q$, and we want to compute its causes, assuming that all tuples are endogenous.

It turns out that Author(John, TODS) is an actual cause, with contingency sets $\Gamma_1 = \{\mathit{Author}(\mathrm{John}, \mathrm{TKDE})\}$ and $\Gamma_2 = \{\mathit{Journal}(\mathrm{TKDE}, \mathrm{XML}, 30)\}$, because Author(John, TODS) is a counterfactual cause for answer (John, XML) in both $D \setminus \Gamma_1$ and $D \setminus \Gamma_2$. Therefore, the responsibility of Author(John, TODS) is $\frac{1}{2}$.

Likewise, Journal(TKDE, XML, 30), Author(John, TKDE), and Journal(TODS, XML, 32) are actual causes for (John, XML), each with responsibility $\frac{1}{2}$.

Now, under the assumption that the tuples in Journal are exogenous (so that only the Author tuples are endogenous), the only actual causes for answer (John, XML) are Author(John, TKDE) and Author(John, TODS). □
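The causes and responsibilities in Example 1.1 can be recomputed directly from Definition 1.1 and the responsibility formula with the brute-force Python sketch below. The tuple encoding and the helper names (answers, responsibility, causes) are our own illustrative assumptions; enumerating candidate contingency sets is exponential, so this is only meant for tiny instances and is not the algorithm of the cited works.

```python
from itertools import combinations

# Example 1.1, each tuple encoded as (relation, value, ...).
AUTHOR = {("Author", "Joe", "TKDE"), ("Author", "John", "TKDE"),
          ("Author", "Tom", "TKDE"), ("Author", "John", "TODS")}
JOURNAL = {("Journal", "TKDE", "XML", 30), ("Journal", "TKDE", "CUBE", 31),
           ("Journal", "TODS", "XML", 32)}
D = AUTHOR | JOURNAL

def answers(db):
    """Answers to query (1): Author joined with Journal on the journal name."""
    return {(a[1], j[2])
            for a in db if a[0] == "Author"
            for j in db if j[0] == "Journal" and a[2] == j[1]}

def responsibility(db, endo, tau, answer):
    """1 / (1 + |Gamma|) for a smallest contingency set Gamma, or 0 if none exists."""
    others = [t for t in endo if t != tau]  # Gamma is a subset of endo without tau
    for size in range(len(others) + 1):  # increasing size: first hit is smallest
        for gamma in combinations(others, size):
            rest = db - set(gamma)
            if answer in answers(rest) and answer not in answers(rest - {tau}):
                return 1 / (1 + size)
    return 0

def causes(db, endo, answer):
    """Map each actual cause for answer to its responsibility."""
    result = {}
    for tau in endo:
        rho = responsibility(db, endo, tau, answer)
        if rho > 0:
            result[tau] = rho
    return result

all_endogenous = causes(D, D, ("John", "XML"))
authors_endogenous = causes(D, AUTHOR, ("John", "XML"))
```

With all tuples endogenous, all_endogenous maps the two Author tuples and the two Journal tuples of John's derivations to 0.5; with only Author endogenous, the two Author(John, ...) tuples remain, matching the example.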


A Datalog query $Q(\bar{x})$ is a whole program $\Pi$ consisting of positive rules that accesses an underlying extensional database $E$ that is not part of the query. Program $\Pi$ contains a rule that defines a top answer-collecting predicate $\mathit{Ans}(\bar{x})$, by means of a rule of the form $\mathit{Ans}(\bar{x}) \leftarrow P_1(\bar{s}_1), \ldots, P_m(\bar{s}_m)$. Now, $\bar{a}$ is an answer to query $\Pi$ on $E$ when $\Pi \cup E \models \mathit{Ans}(\bar{a})$. Here, entailment ($\models$) means that the RHS belongs to the minimal model of the LHS. So, the extension $\mathit{Ans}(D)$ of $\mathit{Ans}$ in the minimal model of the program contains the answers to the query.

A Datalog query is boolean if the top answer-predicate is propositional, say $\mathit{ans}$, i.e. defined by a rule of the form $\mathit{ans} \leftarrow P_1(\bar{s}_1), \ldots, P_m(\bar{s}_m)$. In this case, the query is true if $\Pi \cup E \models \mathit{ans}$, equivalently, if $\mathit{ans}$ belongs to the minimal model of $\Pi \cup E$ (Ceri et al., 1989; Abiteboul et al., 1995).
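The minimal-model semantics can be illustrated with a minimal Python sketch of naive bottom-up (fixpoint) evaluation for positive, safe Datalog. The rule encoding (atoms as tuples, variables as strings starting with "?") and the function names are our own assumptions; practical engines use semi-naive evaluation and indexing instead. The usage evaluates a hypothetical boolean, recursive query that is true when d is reachable from a.

```python
def match(atom, fact, subst):
    """Extend substitution subst so that atom matches fact; None on a clash."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    subst = dict(subst)
    for term, value in zip(atom[1:], fact[1:]):
        if isinstance(term, str) and term.startswith("?"):  # a variable
            if subst.setdefault(term, value) != value:
                return None
        elif term != value:  # a constant that does not match
            return None
    return subst

def substitutions(body, facts, subst):
    """Yield every extension of subst that maps all body atoms into facts."""
    if not body:
        yield subst
        return
    for fact in facts:
        extended = match(body[0], fact, subst)
        if extended is not None:
            yield from substitutions(body[1:], facts, extended)

def minimal_model(rules, edb):
    """Naive bottom-up evaluation: apply all rules until nothing new is derived."""
    model = set(edb)
    while True:
        derived = {(head[0],) + tuple(s.get(t, t) for t in head[1:])
                   for head, body in rules
                   for s in substitutions(body, model, {})}
        if derived <= model:
            return model
        model |= derived

# Hypothetical boolean, recursive query: ans holds if d is reachable from a.
RULES = [
    (("T", "?x", "?y"), [("Edge", "?x", "?y")]),
    (("T", "?x", "?y"), [("Edge", "?x", "?z"), ("T", "?z", "?y")]),
    (("ans",), [("T", "a", "d")]),
]
EDB = {("Edge", "a", "b"), ("Edge", "b", "c"), ("Edge", "c", "d")}
MODEL = minimal_model(RULES, EDB)
```

MODEL contains ans together with the six T-facts of the transitive closure of the three edges, so this boolean query is true on EDB.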


CQs can be expressed as Datalog queries; e.g. query (1) becomes:

$$\mathit{Ans}_Q(\mathit{Name}, \mathit{Topic}) \leftarrow \mathit{Author}(\mathit{Name}, \mathit{JName}),\ \mathit{Journal}(\mathit{JName}, \mathit{Topic}, \#\mathit{Paper}).$$

In (Meliou et al., 2010a), causality for non-query answers is defined on the basis of sets of potentially missing tuples that account for the missing answer. Computing actual causes and their responsibilities for non-answers becomes a rather simple variation of causes for answers. In this work we focus on causality for query answers.

The complexity of the computational and decision problems that arise in query causality has been investigated in (Meliou et al., 2010a; Salimi & Bertossi, 2015). Here we present some problems and results that we use throughout this paper. The first is the causality problem, about deciding whether a tuple is an actual cause for a query answer.

Definition 1.2. For a boolean monotone query $Q$, the causality decision problem (CDP) is (deciding about membership of):

$$\mathcal{CDP}(Q) := \{(D, \tau) \mid \tau \in D^n \text{ and } \tau \in \mathit{Causes}(D, Q)\}. \quad □$$

This problem is tractable for UCQs (Salimi & Bertossi, 2015). The next is the responsibility problem, about deciding responsibility (above a given bound) of a tuple for a query result.

Definition 1.3. For a boolean monotone query $Q$, the responsibility decision problem (RDP) is (deciding about membership of):

$$\mathcal{RDP}(Q) := \{(D, \tau, v) \mid \tau \in D^n,\ v \in \{0\} \cup \{\tfrac{1}{k} \mid k \in \mathbb{N}^+\},\ D \models Q \text{ and } \rho_Q(\tau) > v\}. \quad □$$

This problem is NP-complete for UCQs (Salimi & Bertossi, 2015), but tractable for linear CQs (Meliou et al., 2010a). Roughly speaking, a CQ is linear if its atoms can be ordered in a way that every variable appears in a continuous sequence of atoms that does not contain a self-join (i.e. a join involving the same predicate). For example, $\exists xvyuz\,(A(x) \wedge S_1(x, v) \wedge S_2(v, y) \wedge R(y, u) \wedge S_3(y, z))$ is linear, but $\exists xyz\,(A(x) \wedge B(y) \wedge C(z) \wedge W(x, y, z))$ is not, and for the latter RDP is NP-complete. The class of CQs for which RDP is tractable can be extended to weakly linear CQs.⁵

The functional, non-decision version of RDP, about computing the responsibility, i.e. an optimization problem, is complete for $FP^{NP(\log(n))}$ for UCQs (Salimi & Bertossi, 2015).

Finally, we have the problem of deciding whether a tuple is a most responsible cause:

Definition 1.4. For a boolean monotone query $Q$, the most responsible cause decision problem (MRDP) is:

$$\mathcal{MRDP}(Q) := \{(D, \tau) \mid \tau \in D^n \text{ and } 0 < \rho_Q(\tau) \text{ is a maximum for } D\}. \quad □$$

For UCQs this problem is complete for $P^{NP(\log(n))}$ (Salimi & Bertossi, 2015).


⁵ Computing sizes of minimum contingency sets is reduced to the max-flow/min-cut problem in a network.

1.2 VIEW-CONDITIONED CAUSALITY

A form of conditional causality was informally introduced in (Meliou et al., 2010b) to characterize causes for a query answer that are conditioned by the other answers to the query. The notion was made precise in (Meliou et al., 2011), in a more general, non-relational setting that in particular includes the case of several queries. There, the notion of view-conditioned causality was used, and we adapt it in the following to the case of a single query, possibly with several answers.

Consider an instance $D = D^n \cup D^x$, and a monotone query $Q$ with $Q(D) = \{\bar{a}_1, \ldots, \bar{a}_n\}$. Fix an answer, say $\bar{a}_k \in Q(D)$, while the other answers will be used as a condition on $\bar{a}_k$'s causality. Intuitively, $\bar{a}_k$ is somehow unexpected, and we look for causes by considering the other answers as “correct”. The latter assumption has, in technical terms, the effect of reducing the spectrum of contingency sets, by keeping $Q(D)$'s extension fixed, as a view, modulo the answer at hand.

Definition 1.5. (a) A tuple $\tau \in D^n$ is called a view-conditioned counterfactual cause (VCC-cause) for answer $\bar{a}_k$ to $Q$ if $D \setminus \{\tau\} \not\models Q(\bar{a}_k)$ and $D \setminus \{\tau\} \models Q(\bar{a}_i)$, for $i \in \{1, \ldots, n\} \setminus \{k\}$.

(b) A tuple $\tau \in D^n$ is a view-conditioned actual cause (VC-cause) for $\bar{a}_k$ if there exists a contingency set $\Gamma \subseteq D^n$ such that $\tau$ is a VCC-cause for $\bar{a}_k$ in $D \setminus \Gamma$.

(c) $\mathit{vc}\text{-}\mathit{Causes}(D, Q(\bar{a}_k))$ denotes the set of all VC-causes for $\bar{a}_k$. □

Intuitively, a tuple $\tau$ is a VC-cause for $\bar{a}_k$ if there is a contingent state of the database that entails all the answers to $Q$, and in which $\tau$ is a counterfactual cause for $\bar{a}_k$, but not for the rest of the answers. Obviously, VC-causes for $\bar{a}_k$ are also actual causes, but not necessarily the other way around: $\mathit{vc}\text{-}\mathit{Causes}(D, Q(\bar{a}_k)) \subseteq \mathit{Causes}(D, Q(\bar{a}_k))$.

Example 1.2. (Example 1.1 cont.) Consider the same instance $D$, query $Q$, and the answer (John, XML), which does not have any VC-cause. To see this, take for example the tuple Author(John, TODS), which is an actual cause for (John, XML), with two contingency sets, $\Gamma_1$ and $\Gamma_2$. It is easy to verify that neither of these contingency sets satisfies the condition in Definition 1.5; e.g. the original answer (John, CUBE) is no longer an answer in $D \setminus \Gamma_1$. The same argument can be applied to all actual causes for (John, XML). □

This example shows that it makes sense to study the complexity of deciding whether a query answer has a VC-actual cause or not.
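Definition 1.5 can be checked on Example 1.2 with the brute-force Python sketch below, which searches all contingency sets for one in which removing a candidate tuple kills the chosen answer while every other answer survives. The encoding and the function names (answers, is_vcc, vc_causes) are our own assumptions, and the exhaustive search is exponential; it only illustrates the definition.

```python
from itertools import combinations

# Example 1.1, each tuple encoded as (relation, value, ...).
AUTHOR = {("Author", "Joe", "TKDE"), ("Author", "John", "TKDE"),
          ("Author", "Tom", "TKDE"), ("Author", "John", "TODS")}
JOURNAL = {("Journal", "TKDE", "XML", 30), ("Journal", "TKDE", "CUBE", 31),
           ("Journal", "TODS", "XML", 32)}
D = AUTHOR | JOURNAL

def answers(db):
    """Answers to query (1): Author joined with Journal on the journal name."""
    return {(a[1], j[2])
            for a in db if a[0] == "Author"
            for j in db if j[0] == "Journal" and a[2] == j[1]}

def is_vcc(rest, tau, answer, other_answers):
    """tau is a VCC-cause for answer in rest: deleting it removes answer only."""
    without_tau = answers(rest - {tau})
    return (answer in answers(rest) and answer not in without_tau
            and other_answers <= without_tau)

def vc_causes(db, endo, answer):
    """Brute-force vc-Causes(D, Q(answer)) following Definition 1.5."""
    if answer not in answers(db):
        raise ValueError("not an answer to the query on db")
    other_answers = answers(db) - {answer}
    found = set()
    for tau in endo:
        candidates = [t for t in endo if t != tau]
        if any(is_vcc(db - set(gamma), tau, answer, other_answers)
               for size in range(len(candidates) + 1)
               for gamma in combinations(candidates, size)):
            found.add(tau)
    return found

john_xml = vc_causes(D, D, ("John", "XML"))
john_cube = vc_causes(D, D, ("John", "CUBE"))
```

As in Example 1.2, john_xml is empty. For contrast, (John, CUBE) has the VC-cause Author(John, TKDE) with an empty contingency set: deleting it removes (John, CUBE), while (John, XML) survives through TODS.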


Definition 1.6. For a monotone query $Q$, the view-conditioned cause problem is (deciding about membership of):

$$\mathcal{VCDP}(Q) := \{(D, \bar{a}) \mid \bar{a} \in Q(D) \text{ and } \mathit{vc}\text{-}\mathit{Causes}(D, Q(\bar{a})) \neq \emptyset\}. \quad □$$

2 CAUSALITY AND ABDUCTION

In general logical terms, an abductive explanation of an observation is a formula that, together with the background logical theory, entails the observation. So, one could see an abductive explanation as a cause for the observation. However, it has been argued that causes and abductive explanations are not necessarily the same (Psillos, 1996; Denecker & Kakas, 2002).

Under the abductive approach to diagnosis (Console et al., 1991; Eiter & Gottlob, 1995; Poole, 1992), it is common that the system specification rather explicitly describes causality information, especially in action theories where the effects of actions are directly represented by Horn formulas. By restricting the explanation formulas to the predicates describing primitive causes (action executions), an explanation formula that entails an observation also gives a cause for the observation (Denecker & Kakas, 2002). In this case, and in some sense, causality information is imposed by the system specifier (Poole, 1992).

In database causality we do not have, at least not initially, a system description,⁶ but just a set of tuples. It is when we pose a query that we create something like a description, and the causal relationships between tuples are captured by the combination of atoms in the query. If the query is a Datalog query (in particular, a CQ), then we have a Horn specification too.

In this section we will establish connections between abductive diagnosis and database causality.⁷ For that, we have to be more precise about the kind of abduction problems we will consider.

2.1 BACKGROUND ON DATALOG ABDUCTIVE DIAGNOSIS

A Datalog abduction problem (Eiter et al., 1997) is of the form $\mathcal{AP} = \langle \Pi, E, \mathit{Hyp}, \mathit{Obs} \rangle$, where: (a) $\Pi$ is a set of Datalog rules, (b) $E$ is a set of ground atoms (the extensional database), whose predicates do not appear in heads of rules in $\Pi$, (c) $\mathit{Hyp}$, the hypothesis, is a finite set of ground atoms, the abducible atoms in this case,⁸ and (d) $\mathit{Obs}$, the observation, is a finite conjunction of ground atoms. As is common, we will start with the assumption that $\Pi \cup E \cup \mathit{Hyp} \models \mathit{Obs}$.

The abduction problem is about computing a minimal $\Delta \subseteq \mathit{Hyp}$ (under a certain minimality criterion) such that $\Pi \cup E \cup \Delta \models \mathit{Obs}$. More specifically:

Definition 2.1. Consider a Datalog abduction problem $\mathcal{AP} = \langle \Pi, E, \mathit{Hyp}, \mathit{Obs} \rangle$.

(a) An abductive diagnosis (or simply, a solution) for $\mathcal{AP}$ is a subset-minimal $\Delta \subseteq \mathit{Hyp}$ such that $\Pi \cup E \cup \Delta \models \mathit{Obs}$. This requires that no proper subset of $\Delta$ has this property. $\mathit{Sol}(\mathcal{AP})$ denotes the set of abductive diagnoses for problem $\mathcal{AP}$.

(b) A hypothesis $h \in \mathit{Hyp}$ is relevant for $\mathcal{AP}$ if $h$ is contained in at least one diagnosis of $\mathcal{AP}$. $\mathit{Rel}(\mathcal{AP})$ collects all relevant hypotheses for $\mathcal{AP}$. □

⁶ Having integrity constraints would go in that direction, but we are not considering their presence in this work. However, see (Salimi & Bertossi, 2015, sec. 5) for a consistency-based diagnosis connection.

⁷ In (Salimi & Bertossi, 2015) we established such a connection between another form of model-based diagnosis (Struss, 2008), namely consistency-based diagnosis (Reiter, 1987). For relationships and comparisons between consistency-based and abductive diagnosis, see (Console et al., 1991).

⁸ It is common to accept as hypotheses all the possible ground instantiations of abducible predicates. We assume abducible predicates do not appear in rule heads.

We are interested in deciding, for a fixed Datalog program, whether a hypothesis is relevant or not, with all the data as input. More precisely, we consider the following decision problem.

Definition 2.2. Given a Datalog program $\Pi$, the relevance decision problem (RLDP) for $\Pi$ is (deciding about the membership of):

$$\mathcal{RLDP}(\Pi) := \{(E, \mathit{Hyp}, \mathit{Obs}, h) \mid h \in \mathit{Rel}(\mathcal{AP}), \text{ with } \mathcal{AP} = \langle \Pi, E, \mathit{Hyp}, \mathit{Obs} \rangle \text{ and } h \in \mathit{Hyp}\}. \quad □$$
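Definitions 2.1 and 2.2 can be illustrated with a brute-force Python sketch that enumerates the subset-minimal diagnoses and collects the relevant hypotheses. The encoding (atoms as tuples, variables as strings starting with "?"), the function names, and the reachability program are our own assumptions. Subsets are tried in order of increasing size, so a subset that entails the observation is minimal exactly when it contains no diagnosis found earlier. The enumeration is exponential, which is consistent with relevance being NP-complete in data (Proposition 2.1); the sketch only illustrates the definitions.

```python
from itertools import combinations

def match(atom, fact, subst):
    """Extend substitution subst so that atom matches fact; None on a clash."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    subst = dict(subst)
    for term, value in zip(atom[1:], fact[1:]):
        if isinstance(term, str) and term.startswith("?"):
            if subst.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return subst

def substitutions(body, facts, subst):
    """Yield every extension of subst that maps all body atoms into facts."""
    if not body:
        yield subst
        return
    for fact in facts:
        extended = match(body[0], fact, subst)
        if extended is not None:
            yield from substitutions(body[1:], facts, extended)

def minimal_model(rules, facts):
    """Naive bottom-up fixpoint of positive, safe Datalog rules."""
    model = set(facts)
    while True:
        derived = {(head[0],) + tuple(s.get(t, t) for t in head[1:])
                   for head, body in rules
                   for s in substitutions(body, model, {})}
        if derived <= model:
            return model
        model |= derived

def diagnoses(rules, edb, hyp, obs):
    """Sol(AP): subset-minimal Delta in hyp with rules + edb + Delta entailing obs."""
    found = []
    hyp = sorted(hyp)
    for size in range(len(hyp) + 1):  # increasing size: first hits are minimal
        for delta in map(set, combinations(hyp, size)):
            if any(d <= delta for d in found):
                continue  # contains a smaller diagnosis, so not subset-minimal
            model = minimal_model(rules, edb | delta)
            if all(o in model for o in obs):
                found.append(delta)
    return found

def relevant(rules, edb, hyp, obs):
    """Rel(AP): hypotheses occurring in at least one abductive diagnosis."""
    return set().union(*diagnoses(rules, edb, hyp, obs))

# Hypothetical problem: which abducible edges explain the observation T(a, c)?
RULES = [
    (("T", "?x", "?y"), [("Edge", "?x", "?y")]),
    (("T", "?x", "?y"), [("Edge", "?x", "?z"), ("T", "?z", "?y")]),
]
HYP = {("Edge", "a", "b"), ("Edge", "b", "c"), ("Edge", "a", "c"), ("Edge", "c", "d")}
OBS = [("T", "a", "c")]
SOLUTIONS = diagnoses(RULES, set(), HYP, OBS)
```

The two diagnoses are {Edge(a, c)} and {Edge(a, b), Edge(b, c)}, so Edge(c, d) is the only irrelevant hypothesis.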



Figure 1: (a) $\mathcal{H}(D)$. (b) A tree-decomposition of $\mathcal{H}(D)$.

As is common, we will assume that $|\mathit{Obs}|$, i.e. the number of atoms in the conjunction, is bounded above by a numerical parameter $p$. It is common that $p = 1$ (a single atomic observation).

Definition 2.2 suggests that we are interested in the data complexity of the relevance problem for Datalog abduction. That is, the Datalog program is fixed, and the hypotheses and the input structure may change and may be regarded as data. In contrast, under combined complexity the program is also part of the input, and the complexity is measured also in terms of the program size.

The following result is obtained by showing that the NP-complete combined complexity of the relevance problem for Propositional Datalog Abduction (PDA) (established in (Friedrich et al., 1990)) coincides with the data complexity of the relevance problem for (non-propositional) Datalog Abduction. For this, techniques developed in (Eiter et al., 1997) can be used.

Proposition 2.1. For every Datalog program $\Pi$, $\mathcal{RLDP}(\Pi) \in NP$, and there are programs $\Pi'$ for which $\mathcal{RLDP}(\Pi')$ is NP-hard. □

It is clear from this result that the combined complexity of deciding relevance for Datalog abduction is also intractable. However, a tractable case of combined complexity is identified in (Gottlob et al., 2010b), on the basis of the notions of tree-decomposition and bounded tree-width, which we now briefly present.

Let $\mathcal{H} = (V, H)$ be a hypergraph. $V$ is the set of vertices, and $H$ the set of hyperedges, i.e. of subsets of $V$. A tree-decomposition of $\mathcal{H}$ is a pair $(T, \lambda)$, where $T = (N, E)$ is a tree and $\lambda$ is a labeling function that assigns to each node $n \in N$ a subset $\lambda(n)$ of $V$ ($\lambda(n)$ is aka. bag), i.e. $\lambda(n) \subseteq V$, such that the following hold: (a) For every $v \in V$, there exists $n \in N$ with $v \in \lambda(n)$. (b) For every $h \in H$, there exists a node $n \in N$ with $h \subseteq \lambda(n)$. (c) For every $v \in V$, the set of nodes $\{n \mid v \in \lambda(n)\}$ induces a connected subtree of $T$.

The width of a tree-decomposition $(T, \lambda)$ of $\mathcal{H} = (V, H)$, with $T = (N, E)$, is defined as $\max\{|\lambda(n)| - 1 : n \in N\}$. The tree-width $tw(\mathcal{H})$ of $\mathcal{H}$ is the minimum width over all its tree-decompositions.

Intuitively, the tree-width of a hypergraph $\mathcal{H}$ is a measure of the “tree-likeness” of $\mathcal{H}$. A set of vertices that form a cycle in $\mathcal{H}$ are put into the same bag, which becomes (the bag of) a node in the corresponding tree-decomposition. If the tree-width of the hypergraph under consideration is bounded by a fixed constant, then many otherwise intractable problems become tractable (Gottlob et al., 2010a).


It is possible to associate a hypergraph to any finite structure $D$ (think of a relational database): if its universe (the active domain in the case of a relational database) is $V$, define the hypergraph $\mathcal{H}(D) = (V, H)$, with $H = \{\{a_1, \ldots, a_n\} \mid D \text{ contains a ground atom } P(a_1, \ldots, a_n) \text{ for some predicate symbol } P\}$.


Example 2.1. Consider instance $D$ in Example 1.1. The hypergraph $\mathcal{H}(D)$ associated to $D$ is shown in Figure 1(a). Its vertices are the elements of $\mathit{adom}(D) = \{$John, Joe, Tom, TODS, TKDE, XML, CUBE, 30, 31, 32$\}$, the active domain of $D$. For example, since Journal(TKDE, XML, 30) $\in D$, $\{$TKDE, XML, 30$\}$ is one of the hyperedges.

The dashed ovals show four sets of vertices, i.e. hyperedges, that together form a cycle. Their elements are put into the same bag of the tree-decomposition. Figure 1(b) shows a possible tree-decomposition of $\mathcal{H}(D)$. In it, the maximum $|\lambda(n)| - 1$ is $6 - 1$, corresponding to the top bag of the tree. So, $tw(\mathcal{H}(D)) \leq 5$. □
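Conditions (a) to (c) and the width can be checked mechanically. The Python sketch below builds $\mathcal{H}(D)$ for the instance of Example 1.1 and validates a decomposition whose root bag holds the six vertices of the four-hyperedge cycle, which gives width $6 - 1 = 5$ as in Example 2.1. The bags, node names, and function names are our own assumptions (Figure 1(b) may differ in detail), and the tree shape of tree_edges is assumed rather than checked.

```python
from collections import deque

# Example 1.1 instance; H(D) has one hyperedge per tuple (its set of values).
D = {("Author", "Joe", "TKDE"), ("Author", "John", "TKDE"),
     ("Author", "Tom", "TKDE"), ("Author", "John", "TODS"),
     ("Journal", "TKDE", "XML", 30), ("Journal", "TKDE", "CUBE", 31),
     ("Journal", "TODS", "XML", 32)}
HYPEREDGES = [set(fact[1:]) for fact in D]

def is_tree_decomposition(hyperedges, tree_edges, bags):
    """Check conditions (a)-(c); tree_edges is assumed to form a tree over bags."""
    vertices = set().union(*hyperedges)
    # (a) every vertex occurs in some bag
    if not all(any(v in bag for bag in bags.values()) for v in vertices):
        return False
    # (b) every hyperedge is contained in some bag
    if not all(any(h <= bag for bag in bags.values()) for h in hyperedges):
        return False
    # (c) for each vertex, the nodes whose bags contain it are connected
    adjacency = {n: set() for n in bags}
    for u, w in tree_edges:
        adjacency[u].add(w)
        adjacency[w].add(u)
    for v in vertices:
        holders = {n for n, bag in bags.items() if v in bag}
        start = next(iter(holders))
        seen, queue = {start}, deque([start])
        while queue:  # breadth-first search restricted to holders
            for m in adjacency[queue.popleft()] & holders:
                if m not in seen:
                    seen.add(m)
                    queue.append(m)
        if seen != holders:
            return False
    return True

def width(bags):
    """Width of a decomposition: largest bag size minus one."""
    return max(len(bag) for bag in bags.values()) - 1

# Hypothetical decomposition: the root bag holds the four-hyperedge cycle.
BAGS = {
    "root": {"John", "TKDE", "TODS", "XML", 30, 32},
    "joe": {"Joe", "TKDE"},
    "tom": {"Tom", "TKDE"},
    "cube": {"TKDE", "CUBE", 31},
}
TREE_EDGES = [("root", "joe"), ("root", "tom"), ("root", "cube")]
```

Dropping TKDE from the root bag violates condition (b), since the hyperedge {John, TKDE} is then no longer covered by any bag.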


The following is a fixed-parameter tractability result for the relevance decision problem for Datalog abduction problems with a program $\Pi$ that is guarded, which means that in every rule body there is an atom that contains (guards) all the variables appearing in that body.
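Guardedness is a purely syntactic condition, so it is easy to test. The Python sketch below assumes our own rule encoding (atoms as tuples, variables as strings starting with "?"); the three example programs are illustrations, not taken from the cited works. A single join such as ans ← R(x, y), S(y) is guarded by R(x, y); the recursive transitive-closure rule and query (1) written as a Datalog rule are not guarded, because no single body atom contains all of their variables.

```python
def variables(atom):
    """Variables of an atom; variables are strings starting with '?'."""
    return {t for t in atom[1:] if isinstance(t, str) and t.startswith("?")}

def is_guarded(rules):
    """True if every non-empty rule body has an atom containing all body variables."""
    for _head, body in rules:
        body_vars = set().union(*(variables(atom) for atom in body))
        if body and not any(body_vars <= variables(atom) for atom in body):
            return False
    return True

# ans <- R(x, y), S(y): the atom R(x, y) guards the whole body.
SIMPLE_JOIN = [(("ans",), [("R", "?x", "?y"), ("S", "?y")])]
# Transitive closure: no body atom of the recursive rule contains x, y and z.
TRANSITIVE_CLOSURE = [
    (("T", "?x", "?y"), [("Edge", "?x", "?y")]),
    (("T", "?x", "?y"), [("Edge", "?x", "?z"), ("T", "?z", "?y")]),
]
# Query (1) as a Datalog rule: neither body atom contains all four variables.
QUERY_1 = [(("AnsQ", "?name", "?topic"),
            [("Author", "?name", "?jname"),
             ("Journal", "?jname", "?topic", "?papers")])]
```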


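Theorem 2.2. (Gottlob et al., 2010b) Let $k$ be an integer. For Datalog abduction problems $\mathcal{AP} = \langle \Pi, E, \mathit{Hyp}, \mathit{Obs} \rangle$ where $\Pi$ is guarded and $tw(\mathcal{H}(E)) \leq k$, relevance can be decided in polynomial time in $|\mathcal{AP}|$. More precisely, the decision problem

$$\mathcal{RLDP} := \{(\langle \Pi, E, \mathit{Hyp}, \mathit{Obs} \rangle, h) \mid h \in \mathit{Rel}(\langle \Pi, E, \mathit{Hyp}, \mathit{Obs} \rangle),\ h \in \mathit{Hyp},\ \Pi \text{ is guarded, and } tw(\mathcal{H}(E)) \leq k\}$$

is tractable. □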


In order to represent Catalog abduction in terms of query- 
answer causality, we show that abductive diagnoses from 
Catalog programs are formed essentially by actual causes 
for the observation. 


This is a case of tractable combined complexity with a fixed 
parameter that is the tree-width of the extensional database. 

2.2 QUERY CAUSALITY FROM ABDUCTIVE 
DIAGNOSIS 

In this section we first show that, for the class of Catalog 
theories (system specihcations), abductive inference corre¬ 
sponds to actual causation for monotone queries. That is, 
abductive diagnoses for an observation essentially contain 
actual causes for the observation. 

Assume that If is a boolean, possibly recursive Catalog 
query. Consider the relational instance D — U D^. 
Also assume that If U I? ^ ans. So, the decision problem 
in Cehnition ll.2l takes the form CWili) := {{D, r) | r € 
C", and r € Causes{D,Il)}. 

We now show that actual causes for ans can be obtained from abductive diagnoses of the associated causal Datalog abduction problem (CDAP): $\mathcal{AP}^c := (\Pi, D^x, D^n, \mathit{ans})$, where $D^x$ is the extensional database for $\Pi$ (and then $\Pi \cup D^x$ becomes the background theory), $D^n$ becomes the set of hypotheses, and atom ans is the observation.

Proposition 2.3. $\tau \in D^n$ is an actual cause for ans iff $\tau \in \mathit{Rel}(\mathcal{AP}^c)$. □

Example 2.2. Consider the instance $D$ with relations $R$ and $S$ as below, and the query $\Pi\!: \mathit{ans} \leftarrow R(x, y), S(y)$, which is true in $D$. Assume all tuples are endogenous.

    R   X    Y          S   X
        a1   a4             a1
        a2   a1             a2
        a3   a3             a3

$\mathcal{AP}^c = (\Pi, \emptyset, D, \mathit{ans})$ has two (subset-minimal) abductive diagnoses: $\Delta_1 = \{S(a_1), R(a_2, a_1)\}$ and $\Delta_2 = \{S(a_3), R(a_3, a_3)\}$. Then, $\mathit{Rel}(\mathcal{AP}^c) = \{S(a_3), R(a_3, a_3), S(a_1), R(a_2, a_1)\}$. It is easy to see that the relevant hypotheses are actual causes for ans. □
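Proposition 2.3 can be checked on this small instance by brute force. The Python sketch below is exponential and meant only as an illustration. It hard-codes the query as a Python predicate rather than running it through a Datalog engine. It computes the subset-minimal abductive diagnoses of the CDAP and their union Rel. It also computes the actual causes with their responsibilities $1/(1 + |\Gamma|)$, where $\Gamma$ is a smallest contingency set. Anticipating Definition 2.3 and Proposition 2.4, it then computes the same responsibilities from minimum-cardinality necessary-hypothesis sets. The tuple encoding and helper names are hypothetical.

```python
from itertools import combinations

# Instance of Example 2.2 as a set of (relation, arguments) facts.
D = frozenset({
    ("R", ("a1", "a4")), ("R", ("a2", "a1")), ("R", ("a3", "a3")),
    ("S", ("a1",)), ("S", ("a2",)), ("S", ("a3",)),
})
D_exo = frozenset()   # D^x: empty, all tuples are endogenous
D_endo = D - D_exo    # D^n: the hypotheses of the CDAP

def ans(facts):
    """Boolean query ans <- R(x, y), S(y), evaluated directly in Python."""
    s_vals = {args[0] for rel, args in facts if rel == "S"}
    return any(rel == "R" and args[1] in s_vals for rel, args in facts)

def subsets(pool):
    """Subsets of pool in order of increasing size."""
    items = sorted(pool)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

def minimal_sets(pool, test):
    """Subset-minimal subsets of pool that satisfy test."""
    found = []
    for s in subsets(pool):
        if test(s) and not any(m <= s for m in found):
            found.append(s)
    return found

# Abductive diagnoses: minimal Delta in D^n such that D^x plus Delta entails ans.
diagnoses = minimal_sets(D_endo, lambda delta: ans(D_exo | delta))
relevant = frozenset().union(*diagnoses)

def min_contingency(t):
    """Size of a smallest contingency set for t, or None if t is not a cause."""
    for gamma in subsets(D_endo - {t}):
        if ans(D - gamma) and not ans(D - gamma - {t}):
            return len(gamma)
    return None

causes = {t for t in D_endo if min_contingency(t) is not None}
rho = {t: 1 / (1 + min_contingency(t)) for t in causes}

# Necessary-hypothesis sets: minimal N in D^n whose removal kills every diagnosis.
necessary = minimal_sets(D_endo, lambda n: not ans(D - n))
rho_via_n = {t: 1 / min(len(n) for n in necessary if t in n) for t in causes}

print(sorted(relevant) == sorted(causes))  # True, as Proposition 2.3 predicts
print(sorted(rho.values()))                # [0.5, 0.5, 0.5, 0.5]
```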

We are interested in obtaining responsibilities of actual causes for ans.

Definition 2.3. Given a CDAP $\mathcal{AP}^c = (\Pi, D^x, D^n, \mathit{ans})$ with $\mathit{Sol}(\mathcal{AP}^c) \neq \emptyset$, $N \subseteq D^n$ is a necessary-hypothesis set if $N$ is subset-minimal such that $\mathit{Sol}(\mathcal{AP}^c_N) = \emptyset$, with $\mathcal{AP}^c_N := (\Pi, D^x, D^n \setminus N, \mathit{ans})$. □

Proposition 2.4. The responsibility of a tuple $t$ for ans is $\frac{1}{|N|}$, where $N$ is a necessary-hypothesis set with minimum cardinality for $\mathcal{AP}^c$ such that $t \in N$. □

More precisely, consider a Datalog abduction problem $\mathcal{AP} = (\Pi, E, \mathit{Hyp}, \mathit{Obs})$, where $E$ is the underlying extensional database and $\mathit{Obs}$ is a conjunction of ground atoms.

Now we construct a query-causality setting: $D := D^x \cup D^n$, with $D^x := E$ and $D^n := \mathit{Hyp}$. Consider the program $\Pi' := \Pi \cup \{\mathit{ans} \leftarrow \mathit{Obs}\}$ (with ans a fresh propositional atom). So, $\Pi'$ is seen as a monotone query on $D$.

Proposition 2.5. A hypothesis $h$ is relevant for $\mathcal{AP}$, i.e. $h \in \mathit{Rel}(\mathcal{AP})$, iff $h$ is an actual cause for ans wrt. $\Pi'$ and $D$. □

Now we use the results obtained so far in this section to obtain new complexity results for Datalog query causality. Actually, the following result is obtained from Propositions 2.3 and 2.5.

Proposition 2.6. For boolean Datalog queries $\Pi$, $\mathcal{CDP}(\Pi)$ is NP-complete (in data). □

This result should be contrasted with the tractability of the same problem for UCQs (Salimi & Bertossi, 2015).

We now introduce a fixed-parameter tractable case of this problem. For this we take advantage of the tractable case of Datalog abduction presented in Section 2.1. The following is a consequence of Theorem 2.2 and Proposition 2.3.

Proposition 2.7. For guarded Datalog queries $\Pi$ and extensional instances $D = D^x \cup D^n$ with $D^x$ of bounded tree-width, $\mathcal{CDP}$ is fixed-parameter tractable in combined complexity, with the parameter being the tree-width bound. □


3 VIEW-UPDATES AND QUERY CAUSALITY

There is a close relationship between query causality and the view-update problem in the form of delete-propagation, which was first suggested in (Kimelfeld, 2012; Kimelfeld et al., 2012) (see also (Buneman et al., 2002)). We start by formalizing some specific computational problems related to the general delete-propagation problem.


3.1 DELETE-PROPAGATION PROBLEMS 

Given a monotone query $Q$, we can think of it as defining a view with virtual contents $Q(D)$. If $a \in Q(D)$, which may not be intended, we may try to delete some tuples from $D$, so that $a$ disappears from $Q(D)$. This is a common case of the problem of database updates through views (Abiteboul et al., 1995). In this work we consider some variations of this problem, in both their functional and decision versions.


⁹ This is Theorem 7.9 in (Gottlob et al., 2010b).
Definition 3.1. For an instance $D$ and a monotone query $Q$:

(a) For $d \in Q(D)$, the minimal source-side-effect problem is about computing a subset-minimal $\Lambda \subseteq D$ such that $d \notin Q(D \setminus \Lambda)$.

(b) The minimal source-side-effect decision problem is (deciding about the membership of):

$\mathcal{MSSEP}^s(Q) = \{(D, D', d) \mid d \in Q(D),\ D' \subseteq D,\ d \notin Q(D'), \text{ and } D' \text{ is subset-maximal}\}$.
(The superscript s stands for subset-minimal.)

(c) For $d \in Q(D)$, the minimum source-side-effect problem is about computing a minimum-cardinality $\Lambda \subseteq D$ such that $d \notin Q(D \setminus \Lambda)$.

(d) The minimum source-side-effect decision problem is (deciding about the membership of):

$\mathcal{MSSEP}^c(Q) = \{(D, D', d) \mid d \in Q(D),\ D' \subseteq D,\ d \notin Q(D'), \text{ and } D' \text{ has maximum cardinality}\}$.
(Here c stands for minimum cardinality.) □
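Here is a brute-force Python sketch that makes the four problems of Definition 3.1 concrete. It uses a hypothetical instance in which the minimal and minimum variants differ. Deleting the single Journal tuple is a minimum deletion, while deleting both Author tuples is subset-minimal but not minimum. The view, the facts and the function names are assumptions for illustration. The enumeration is exponential in $|D|$.

```python
from itertools import combinations

# Hypothetical instance; facts are flat tuples (relation, arg1, arg2).
A_ann = ("Author", "Ann", "J1")
A_bob = ("Author", "Bob", "J1")
J_db = ("Journal", "J1", "DB")
D = frozenset({A_ann, A_bob, J_db})

def Q(facts):
    """Hypothetical monotone view Q(y) <- Author(x, z), Journal(z, y)."""
    return {(j[2],) for a in facts if a[0] == "Author"
            for j in facts if j[0] == "Journal" and j[1] == a[2]}

def deletions(D, d):
    """All Lambda in D with d not in Q(D - Lambda), by increasing size."""
    items = sorted(D)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            lam = frozenset(combo)
            if d not in Q(D - lam):
                yield lam

def minimal_deletions(D, d):
    """Problem (a): all subset-minimal Lambda."""
    found = []
    for lam in deletions(D, d):
        if not any(m <= lam for m in found):
            found.append(lam)
    return found

def minimum_deletion(D, d):
    """Problem (c): one Lambda of minimum cardinality (the first one found)."""
    return next(deletions(D, d))

def in_mssep_s(D, D1, d):
    """Problem (b): D1 is subset-maximal among subinstances with d not in Q(D1)."""
    return d in Q(D) and D1 <= D and (D - D1) in minimal_deletions(D, d)

def in_mssep_c(D, D1, d):
    """Problem (d): D1 has maximum cardinality among such subinstances."""
    return (d in Q(D) and D1 <= D and d not in Q(D1)
            and len(D - D1) == len(minimum_deletion(D, d)))

d = ("DB",)
print(minimal_deletions(D, d))                # two minimal deletions, of sizes 1 and 2
print(in_mssep_s(D, D - {A_ann, A_bob}, d))   # True: subset-maximal D'
print(in_mssep_c(D, D - {A_ann, A_bob}, d))   # False: not of maximum cardinality
```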


Definition 3.2. (Buneman et al., 2002) For an instance $D$ and a monotone query $Q$:

(a) For $d \in Q(D)$, the view side-effect-free problem is about computing a $\Lambda \subseteq D$ such that $Q(D) \setminus \{d\} = Q(D \setminus \Lambda)$.

(b) The view side-effect-free decision problem is (deciding about the membership of):

$\mathcal{VSEFP}(Q) = \{(D, d) \mid d \in Q(D), \text{ and there exists } D' \subseteq D \text{ with } Q(D) \setminus \{d\} = Q(D')\}$. □
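A similar brute-force sketch for Definition 3.2 uses hypothetical data and a hypothetical view $Q(x, y) \leftarrow Author(x, z), Journal(z, y)$. In `D1` the answer (Ann, DB) can be removed without side effects. In `D2` the extra Author tuple means that every deletion removing (Ann, DB) also removes some other answer.

```python
from itertools import combinations

def Q(facts):
    """Hypothetical monotone view Q(x, y) <- Author(x, z), Journal(z, y)."""
    return {(a[1], j[2]) for a in facts if a[0] == "Author"
            for j in facts if j[0] == "Journal" and j[1] == a[2]}

def side_effect_free_deletion(D, d):
    """Problem (a): a smallest Lambda in D with Q(D) - {d} == Q(D - Lambda), or None."""
    target = Q(D) - {d}
    items = sorted(D)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            lam = frozenset(combo)
            if Q(D - lam) == target:
                return lam
    return None

def in_vsefp(D, d):
    """Problem (b): membership of (D, d) in VSEFP(Q)."""
    return d in Q(D) and side_effect_free_deletion(D, d) is not None

D1 = frozenset({("Author", "Ann", "J1"), ("Journal", "J1", "DB"), ("Journal", "J1", "AI")})
D2 = D1 | {("Author", "Bob", "J1")}

print(side_effect_free_deletion(D1, ("Ann", "DB")))  # deletes only Journal(J1, DB)
print(in_vsefp(D2, ("Ann", "DB")))  # False: every deletion also removes another answer
```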


3.2 VIEW DELETIONS VS. CAUSES

In this section we first establish mutual reductions between the different variants of the delete-propagation problem and both query and view-conditioned causality. On this basis, we then obtain some complexity results for view-conditioned causality and the minimum source-side-effect problem.

In this section all tuples in the instances involved are assumed to be endogenous. Consider a relational database $D$ and a view $V$ defined by a monotone query $Q$. So, the virtual view extension, $V(D)$, is $Q(D)$.

For a tuple $d \in V(D)$, the delete-propagation problem, in its most general form, is the task of deleting a set of tuples from $D$, and so obtaining a subinstance $D'$ of $D$, such that $d \notin V(D')$. It is natural to expect that the deletion of $d$ from the view can be achieved through deletions from $D$ of the causes for $d$ to be in the view extension. However, to obtain solutions to the different variants of this problem introduced in Section 3.1, different sets of actual causes must be considered.

First, we show that an actual cause for $d$ to be in $V(D)$ forms, with any of its contingency sets, a solution to the minimal source-side-effect problem (cf. Definition 3.1(b)).

Proposition 3.1. Consider an instance $D$, a view $V$ defined by a monotone query $Q$, and $d \in V(D)$: $D' \subseteq D$ is a solution to the minimal source-side-effect problem, i.e. $(D, D', d) \in \mathcal{MSSEP}^s(Q)$, iff there is a $t \in D \setminus D'$ such that $t \in \mathit{Causes}(D, Q(d))$ and $D \setminus (D' \cup \{t\}) \in \mathit{Cont}(D, Q(d), t)$. □

Now we show that, in order to minimize the side-effect on the source (cf. Definition 3.1(c)), it is good enough to pick a most responsible cause for $d$ with any of its minimum-cardinality contingency sets.

Proposition 3.2. Consider an instance $D$, a view $V$ defined by a monotone query $Q$, and $d \in V(D)$: $D' \subseteq D$ is a solution to the minimum source-side-effect problem, i.e. $(D, D', d) \in \mathcal{MSSEP}^c(Q)$, iff there is a $t \in D \setminus D'$ such that $t \in \mathcal{MRC}(D, Q(d))$, $\Lambda := D \setminus (D' \cup \{t\}) \in \mathit{Cont}(D, Q(d), t)$, and there is no $\Lambda' \in \mathit{Cont}(D, Q(d), t)$ with $|\Lambda'| < |\Lambda|$. □

Next, we show that in order to check if there exists a solution to the view side-effect-free problem for $d \in V(D)$ (cf. Definition 3.2(b)), it is good enough to check if $d$ has a view-conditioned cause.¹⁰

Proposition 3.3. Consider an instance $D$, a view $V$ defined by a monotone query $Q$, and $d \in V(D)$: There is a solution to the view side-effect-free problem for $d$, i.e. $(D, d) \in \mathcal{VSEFP}(Q)$, iff $\textit{vc-Causes}(D, Q(d)) \neq \emptyset$. □

Example 3.1. (Running example, cont.) Consider the same instance $D$, query $Q$, and answer (John, XML). Consider the following sets of tuples:

$S_1$ = {Author(John, TKDE), Journal(TODS, XML, 32)},
$S_2$ = {Author(John, TODS), Journal(TKDE, XML, 30)},
$S_3$ = {Journal(TODS, XML, 30), Journal(TKDE, XML, 30)},
$S_4$ = {Author(John, TODS), Author(John, TKDE)}.

Each of the subinstances $D \setminus S_i$, $i = 1, \ldots, 4$, is a solution to both the minimum and the minimal source-side-effect problems. These solutions essentially contain the actual causes for answer (John, XML), as computed earlier for the running example. Moreover, there is no solution to the view side-effect-free problem associated to this answer, which coincides with the result obtained in Example 1.2 and confirms Proposition 3.3. □
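The claims about $S_1, \ldots, S_4$ can be checked on a small fragment of the running example. The Python sketch below assumes that the instance consists only of the four tuples occurring in $S_1, \ldots, S_4$. It also assumes that the view joins Author and Journal on the journal name and returns (author, topic). It omits the #Paper attribute, which the view projects away. The sketch confirms three things: the $S_i$ are exactly the subset-minimal deletions, they all have minimum size, and, in the spirit of Proposition 3.1, every tuple in each $S_i$ is an actual cause whose contingency set is the rest of $S_i$. It does not reproduce the view side-effect-free part, which depends on parts of the running example not reproduced here.

```python
from itertools import combinations

# Assumed four-tuple fragment of the running example; #Paper is omitted.
A_tods = ("Author", "John", "TODS")
A_tkde = ("Author", "John", "TKDE")
J_tods = ("Journal", "TODS", "XML")
J_tkde = ("Journal", "TKDE", "XML")
D = frozenset({A_tods, A_tkde, J_tods, J_tkde})
d = ("John", "XML")

def Q(facts):
    """Assumed view: join Author and Journal on the journal name."""
    return {(a[1], j[2]) for a in facts if a[0] == "Author"
            for j in facts if j[0] == "Journal" and j[1] == a[2]}

def subsets(pool):
    """Subsets of pool in order of increasing size."""
    items = sorted(pool)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

# Deleted parts D - D' of the minimal source-side-effect solutions.
minimal = []
for lam in subsets(D):
    if d not in Q(D - lam) and not any(m <= lam for m in minimal):
        minimal.append(lam)
smallest = min(len(m) for m in minimal)

S = [frozenset({A_tkde, J_tods}), frozenset({A_tods, J_tkde}),
     frozenset({J_tods, J_tkde}), frozenset({A_tods, A_tkde})]

def cause_witnesses(lam):
    """Tuples t in lam for which lam - {t} is a contingency set of t (cf. Prop. 3.1)."""
    return {t for t in lam if d in Q(D - (lam - {t}))}

print(set(S) == set(minimal))                   # True: exactly the minimal deletions
print(all(len(s) == smallest for s in S))       # True: all of minimum size
print(all(cause_witnesses(s) == s for s in S))  # True
```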

Now we show, the other way around, that actual causes, most responsible causes, and VC-causes can be obtained from solutions to different variants of the delete-propagation problem.

First, we show that actual causes for a query answer can be obtained from the solutions to the corresponding minimal source-side-effect problem.

¹⁰ Since this proposition does not involve contingency sets, the existential problem in Definition 3.2(b) is the right one to consider.






Proposition 3.4. Consider an instance $D$, a view $V$ defined by a monotone query $Q$, and $a \in V(D)$: Tuple $t$ is an actual cause for $a$ iff there is a $D' \subseteq D$ with $t \in (D \setminus D') \subseteq D^n$ and $(D, D', a) \in \mathcal{MSSEP}^s(Q)$. □

Similarly, most responsible causes for a query answer can be obtained from solutions to the corresponding minimum source-side-effect problem.

Proposition 3.5. Consider an instance $D$, a view $V$ defined by a monotone query $Q$, and $a \in V(D)$: Tuple $t$ is a most responsible actual cause for $a$ iff there is a $D' \subseteq D$ with $t \in (D \setminus D') \subseteq D^n$ and $(D, D', a) \in \mathcal{MSSEP}^c(Q)$. □

Finally, VC-causes for an answer can be obtained from solutions to the view side-effect-free problem.

Proposition 3.6. Consider an instance $D$, a view $V$ defined by a monotone query $Q$, and $a \in V(D)$: Tuple $t$ is a VC-cause for $a$ iff there is a $D' \subseteq D$ with $t \in (D \setminus D') \subseteq D^n$ and $D'$ is a solution to the view side-effect-free problem associated to $a$. □
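Propositions 3.4 and 3.5 suggest reading causes off the solutions. Below is a brute-force Python sketch on a hypothetical instance, with made-up Author/Journal data and the view $Q(y) \leftarrow Author(x, z), Journal(z, y)$. The union of the deleted parts $D \setminus D'$ over minimal solutions gives the actual causes. The union over minimum solutions gives the most responsible causes. Both are cross-checked against responsibilities computed directly from contingency sets. All names are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical instance: two authors publish in the same DB journal.
A_ann = ("Author", "Ann", "J1")
A_bob = ("Author", "Bob", "J1")
J_db = ("Journal", "J1", "DB")
D = frozenset({A_ann, A_bob, J_db})
d = ("DB",)

def Q(facts):
    """Hypothetical monotone view Q(y) <- Author(x, z), Journal(z, y)."""
    return {(j[2],) for a in facts if a[0] == "Author"
            for j in facts if j[0] == "Journal" and j[1] == a[2]}

def subsets(pool):
    """Subsets of pool in order of increasing size."""
    items = sorted(pool)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

# Deleted parts D - D' of the minimal source-side-effect solutions.
minimal = []
for lam in subsets(D):
    if d not in Q(D - lam) and not any(m <= lam for m in minimal):
        minimal.append(lam)
# Minimum-cardinality deletions are in particular subset-minimal.
k_min = min(len(m) for m in minimal)
minimum = [m for m in minimal if len(m) == k_min]

causes_from_solutions = frozenset().union(*minimal)     # cf. Proposition 3.4
most_resp_from_solutions = frozenset().union(*minimum)  # cf. Proposition 3.5

def responsibility(t):
    """1 / (1 + |Gamma|) for a smallest contingency set Gamma; 0 if t is no cause."""
    for gamma in subsets(D - {t}):
        if d in Q(D - gamma) and d not in Q(D - gamma - {t}):
            return 1 / (1 + len(gamma))
    return 0.0

rho = {t: responsibility(t) for t in D}
best = max(rho.values())
print(causes_from_solutions == {t for t in D if rho[t] > 0})         # True
print(most_resp_from_solutions == {t for t in D if rho[t] == best})  # True
```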


The partition of a database into endogenous and exogenous tuples used in causality may also be of interest in the context of delete propagation. It makes sense to consider endogenous delete-propagations, that is, those obtained through deletions of endogenous tuples only. Actually, given an instance $D = D^n \cup D^x$, a view $V$ defined by a monotone query $Q$, and $d \in V(D)$, endogenous delete-propagations for $d$ (in all of their flavors) can be obtained from actual causes for $d$ in the partitioned instance.

Example 3.2. (Example 3.1 cont.) Consider again that tuple (John, XML) must be deleted from the query result, and assume now that the data in Journal is reliable. Therefore, only deletions from Author make sense. This can be captured by considering Journal-tuples as exogenous and Author-tuples as endogenous. With this partitioning, only Author(John, TODS) and Author(John, TKDE) are actual causes for (John, XML), and each of them forms a singleton and unique contingency set for the other as a cause (see the running example). Therefore, D \ {Author(John, TODS), Author(John, TKDE)} is a solution to the associated minimal and minimum endogenous delete-propagation problems for (John, XML). □
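Example 3.2 can be replayed by restricting both deletions and contingency sets to $D^n$. The Python sketch below assumes an instance made of the two Author tuples (endogenous) and the two XML Journal tuples (exogenous) that appear in Example 3.1. It omits the #Paper attribute, and it assumes that the view joins the two relations on the journal name and returns (author, topic). All names are illustrative.

```python
from itertools import combinations

# Assumed fragment of Example 3.1; the #Paper attribute is omitted.
A_tods = ("Author", "John", "TODS")
A_tkde = ("Author", "John", "TKDE")
J_tods = ("Journal", "TODS", "XML")
J_tkde = ("Journal", "TKDE", "XML")
D_endo = frozenset({A_tods, A_tkde})  # D^n: Author tuples may be deleted
D_exo = frozenset({J_tods, J_tkde})   # D^x: Journal tuples are reliable
D = D_endo | D_exo
d = ("John", "XML")

def Q(facts):
    """Assumed view: join Author and Journal on the journal name."""
    return {(a[1], j[2]) for a in facts if a[0] == "Author"
            for j in facts if j[0] == "Journal" and j[1] == a[2]}

def subsets(pool):
    """Subsets of pool in order of increasing size."""
    items = sorted(pool)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

# Endogenous delete-propagation: only tuples of D^n may be deleted.
minimal = []
for lam in subsets(D_endo):
    if d not in Q(D - lam) and not any(m <= lam for m in minimal):
        minimal.append(lam)

def contingency_sets(t):
    """All contingency sets for t, drawn from D^n only."""
    return [g for g in subsets(D_endo - {t})
            if d in Q(D - g) and d not in Q(D - g - {t})]

causes = {t for t in D_endo if contingency_sets(t)}
print(minimal == [D_endo])                                # True
print(contingency_sets(A_tods) == [frozenset({A_tkde})])  # True
```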


We now investigate the complexity of the view-conditioned causality problem (cf. Definition 1.6). For this, we take advantage of the connection between VC-causality and the view side-effect-free problem. Actually, the following result is obtained from the NP-completeness of the view side-effect-free problem (Buneman et al., 2002) and the propositions above relating that problem to VC-causality.

Proposition 3.7. For CQs, the view-conditioned causality decision problem, $\mathcal{VCDP}$, is NP-complete. □


Actually, this result also holds for UCQs. The next result is obtained from the $FP^{NP(\log(n))}$-completeness of computing the responsibility of the most responsible causes (obtained in (Salimi & Bertossi, 2015)) and Proposition 3.2.

Proposition 3.8. Computing the size of a solution to the minimum source-side-effect problem is $FP^{NP(\log(n))}$-hard. □


As mentioned in Section 1, responsibility computation (more precisely, the RDP problem in Definition 1.3) is tractable for weakly linear queries. We can take advantage of this result and obtain, via Proposition 3.2, a new tractability result for the minimum source-side-effect problem, which has been shown to be NP-hard for general CQs in (Buneman et al., 2002).

Proposition 3.9. For weakly linear queries, the minimum source-side-effect decision problem is tractable. □

The class of weakly linear queries generalizes that of linear queries (cf. Section 1). So, Proposition 3.9 also holds for linear queries.


In (Buneman et al., 2002) it has been shown that the minimum source-side-effect decision problem is tractable for the class of project-join queries with chain joins. Now, a join on $k$ atoms with different predicates, say $R_1, \ldots, R_k$, is a chain join if there are no attributes (variables) shared by any two atoms $R_i$ and $R_j$ with $j > i + 1$. That is, only consecutive relations may share attributes. For example, $\exists x v y u\,(A(x) \wedge S_1(x, v) \wedge S_2(v, y) \wedge R(y, u) \wedge S_3(y, z))$ is a project-join query with chain joins.

We observe that project-join queries with chain joins correspond to linear queries. Actually, the tractability results for these classes of queries are both obtained via a reduction to the maximum flow problem (Meliou et al., 2010a; Buneman et al., 2002). As a consequence, the result in Proposition 3.9 extends that in (Buneman et al., 2002) from linear queries to weakly linear queries. For example, $\exists x y z\,(R(x, y) \wedge S(y, z) \wedge T(z, x) \wedge V(x))$ is not linear (and hence does not have chain joins either), but it is weakly linear (Meliou et al., 2010a).

4 CONCLUSIONS 

We have related query causality to abductive diagnosis and the view-update problem. Some connections between the last two had been established before. More precisely, the view-update problem has been treated from the point of view of abductive reasoning (Kakas & Mancarella, 1990; Console et al., 1995). The idea is to "abduce" the presence of tuples in the base tables that explain the presence of those tuples in the view extension that one would like, e.g., to get rid of.

In combination with the results reported in (Salimi & Bertossi, 2015), we can see that there are deeper and multiple connections between the areas of query causality, abductive and consistency-based diagnosis, view updates, and database repairs. Results for any of these areas can be profitably applied to the others.¹¹


¹¹ Connections between consistency-based and abductive diag-
We point out that database repairs are related to the view-update problem. Actually, answer set programs (ASPs) (Brewka et al., 2011) for database repairs (Bertossi, 2011) implicitly repair the database by updating conjunctive combinations of intensional, annotated predicates. Those logical combinations, views after all, capture violations of integrity constraints in the original database or along the (implicitly iterative) repair process (a reason for the use of annotations).

Even more, in (Bertossi & Li, 2013), in order to protect sensitive information, databases are explicitly and virtually "repaired" through secrecy views that specify the information that has to be kept secret. In order to protect information, a user is allowed to interact only with the virtually repaired versions of the original database that result from making those views empty or making them contain only null values. Repairs are specified and computed using ASP, and an explicit connection to prioritized attribute-based repairs (Bertossi, 2011) is made (Bertossi & Li, 2013).


Finally, we should note that abduction has also been explicitly applied to database repairs (Arieli et al., 2004). The idea, again, is to "abduce" possible repair updates that bring the database to a consistent state.